Understanding Traffic Density from Large-Scale Web Camera Data
Understanding traffic density from large-scale web camera (webcam) videos is
a challenging problem because such videos have low spatial and temporal
resolution, high occlusion, and large perspective variation. To deeply understand
traffic density, we explore both deep-learning-based and optimization-based
methods. To avoid individual vehicle detection and tracking, both methods map
the image into a vehicle density map, one based on rank-constrained regression
and the other on fully convolutional networks (FCN). The regression-based
method learns different weights for different blocks of the image to increase
the degrees of freedom of the weights and to embed perspective information. The
FCN-based method jointly estimates the vehicle density map and the vehicle count
within a residual learning framework, performing end-to-end dense prediction
that allows arbitrary image resolution and adapts to different vehicle scales
and perspectives. We analyze and compare both methods, and draw insights from
the optimization-based method to improve the deep model. Since existing datasets
do not cover all the challenges in our work, we collected and labelled a
large-scale traffic video dataset containing 60 million frames from 212 webcams.
Both methods are extensively evaluated and compared on different counting tasks
and datasets. The FCN-based method significantly reduces the mean absolute error
from 10.99 to 5.31 on the public TRANCOS dataset compared with the
state-of-the-art baseline.
Comment: Accepted by CVPR 2017. A preprint version was uploaded at
http://welcome.isr.tecnico.ulisboa.pt/publications/understanding-traffic-density-from-large-scale-web-camera-data
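As a concrete illustration of the joint density-and-count estimation described above, the following minimal PyTorch sketch shows how a fully convolutional network can output a per-pixel density map at arbitrary input resolution while learning the count as a residual on top of the density sum. The FCNCounter name, layer widths, and heads are illustrative assumptions, not the paper's architecture:

```python
import torch
import torch.nn as nn

class FCNCounter(nn.Module):
    """Toy fully convolutional counter: predicts a density map and a
    vehicle count, with the count learned as a residual correction to
    the spatial sum of the density map (hypothetical layer sizes)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        self.density_head = nn.Conv2d(64, 1, 1)    # per-pixel vehicle density
        self.count_head = nn.Sequential(           # residual count correction
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 1)
        )

    def forward(self, x):
        f = self.features(x)
        density = torch.relu(self.density_head(f))  # H x W density map
        # residual learning: count = sum of densities + learned correction
        count = density.sum(dim=(1, 2, 3)) + self.count_head(f).squeeze(1)
        return density, count

# usage: any input resolution works because the network is fully convolutional
model = FCNCounter()
density, count = model(torch.randn(2, 3, 240, 352))
```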
FCN-rLSTM: Deep Spatio-Temporal Neural Networks for Vehicle Counting in City Cameras
In this paper, we develop deep spatio-temporal neural networks to
sequentially count vehicles from low-quality videos captured by city cameras
(citycams). Citycam videos have low resolution, low frame rate, high occlusion,
and large perspective variation, making most existing methods lose their
efficacy. To overcome the limitations of existing methods and incorporate the
temporal information of traffic video, we design a novel FCN-rLSTM network
that jointly estimates vehicle density and vehicle count by connecting fully
convolutional networks (FCN) with long short-term memory (LSTM) networks in a
residual learning fashion. This design leverages the strengths of FCN for
pixel-level prediction and of LSTM for learning complex temporal dynamics.
The residual learning connection reformulates vehicle count regression as
learning residual functions with reference to the sum of densities in each
frame, which significantly accelerates network training. To preserve feature
map resolution, we propose a Hyper-Atrous combination that integrates atrous
convolution into the FCN and combines feature maps from different convolution
layers. FCN-rLSTM enables refined feature representation and a novel end-to-end
trainable mapping from pixels to vehicle count. We extensively evaluated the
proposed method on different counting tasks with three datasets, and the
experimental results demonstrate its effectiveness and robustness. In
particular, FCN-rLSTM reduces the mean absolute error (MAE) from 5.31 to 4.21
on TRANCOS, and from 2.74 to 1.53 on WebCamT. The training process
is accelerated by a factor of 5 on average.
Comment: Accepted by the International Conference on Computer Vision (ICCV), 2017
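A rough PyTorch sketch of the residual FCN-LSTM connection and the Hyper-Atrous idea might look as follows. All layer sizes are invented, and for simplicity the LSTM here consumes only the per-frame density sums rather than the paper's full FCN features:

```python
import torch
import torch.nn as nn

class FCNrLSTMSketch(nn.Module):
    """Illustrative (not the paper's exact) FCN-rLSTM: a dilated FCN
    predicts per-frame density maps; an LSTM over the density sums
    learns a residual correction to each frame's count."""
    def __init__(self, hidden=64):
        super().__init__()
        # Hyper-Atrous-style trunk: atrous (dilated) convolutions keep the
        # feature map resolution; outputs of several layers are concatenated
        self.conv1 = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
        self.conv2 = nn.Sequential(nn.Conv2d(16, 16, 3, padding=2, dilation=2), nn.ReLU())
        self.conv3 = nn.Sequential(nn.Conv2d(16, 16, 3, padding=4, dilation=4), nn.ReLU())
        self.density_head = nn.Conv2d(48, 1, 1)
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.count_head = nn.Linear(hidden, 1)

    def forward(self, frames):                  # frames: (B, T, 3, H, W)
        B, T = frames.shape[:2]
        x = frames.flatten(0, 1)                # fold time into the batch dim
        f1 = self.conv1(x); f2 = self.conv2(f1); f3 = self.conv3(f2)
        hyper = torch.cat([f1, f2, f3], dim=1)  # combine multi-layer features
        density = torch.relu(self.density_head(hyper))
        sums = density.sum(dim=(1, 2, 3)).view(B, T, 1)
        out, _ = self.lstm(sums)                # temporal dynamics over sums
        residual = self.count_head(out).squeeze(-1)
        # residual learning: per-frame count = density sum + LSTM correction
        return density.view(B, T, *density.shape[1:]), sums.squeeze(-1) + residual
```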
Margin-Based Few-Shot Class-Incremental Learning with Class-Level Overfitting Mitigation
Few-shot class-incremental learning (FSCIL) is designed to incrementally
recognize novel classes with only a few training samples after (pre-)training
on base classes with sufficient samples, and it focuses on both base-class
performance and novel-class generalization. A well-known modification to
base-class training is to apply a margin to the base-class classification.
However, a dilemma arises: by applying the margin during base-class training,
we can hardly achieve good base-class performance and good novel-class
generalization simultaneously, and this trade-off remains underexplored. In
this paper, we study the cause of this dilemma for FSCIL. We first interpret
the dilemma as a class-level overfitting (CO) problem from the perspective of
pattern learning, and then find that its cause lies in the easily satisfied
constraint of learning margin-based patterns. Based on this analysis, we
propose a novel margin-based FSCIL method that mitigates the CO problem by
providing the pattern learning process with an extra constraint derived from
the margin-based patterns themselves. Extensive experiments on CIFAR100,
Caltech-UCSD Birds-200-2011 (CUB200), and miniImageNet demonstrate that the
proposed method effectively mitigates the CO problem and achieves
state-of-the-art performance.
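For context, the "margin applied to base-class classification" that this abstract builds on is typically a cosine classifier with an additive margin. The sketch below shows that generic baseline only; the hyperparameters are placeholders, and the paper's additional pattern-level constraint is not reproduced:

```python
import torch
import torch.nn.functional as F

def margin_softmax_loss(features, weights, labels, margin=0.1, scale=16.0):
    """Generic margin-based classification loss on cosine similarities
    (CosFace-style additive margin): the ground-truth logit is reduced
    by a margin, forcing features closer to their class weight vector."""
    # cosine similarity between L2-normalized features and class weights
    logits = F.normalize(features) @ F.normalize(weights).t()   # (B, C)
    # subtract the margin only from each sample's ground-truth logit
    onehot = F.one_hot(labels, num_classes=weights.shape[0]).float()
    return F.cross_entropy(scale * (logits - margin * onehot), labels)

# usage with random stand-ins for a batch of embeddings and a classifier
feats = torch.randn(8, 128)        # batch of feature embeddings
protos = torch.randn(100, 128)     # one weight vector per base class
labels = torch.randint(0, 100, (8,))
loss = margin_softmax_loss(feats, protos, labels)
```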
PointCLIP V2: Adapting CLIP for Powerful 3D Open-world Learning
Contrastive Language-Image Pre-training (CLIP) has shown promising open-world
performance on 2D image tasks, while its transferred capability on 3D point
clouds, i.e., PointCLIP, is still far from satisfactory. In this work, we
propose PointCLIP V2, a powerful 3D open-world learner, to fully unleash the
potential of CLIP on 3D point cloud data. First, we introduce a realistic shape
projection module to generate more realistic depth maps for CLIP's visual
encoder, which is quite efficient and narrows the domain gap between projected
point clouds and natural images. Second, we leverage large-scale language
models to automatically design more descriptive 3D-semantic prompts for CLIP's
textual encoder, instead of the previous hand-crafted ones. Without any
training in 3D domains, our approach significantly surpasses PointCLIP by
+42.90%, +40.44%, and +28.75% accuracy on three datasets for zero-shot 3D
classification. Furthermore, PointCLIP V2 can be extended to few-shot
classification, zero-shot part segmentation, and zero-shot 3D object detection
in a simple manner, demonstrating its superior generalization ability for 3D
open-world learning. Code will be available at
https://github.com/yangyangyang127/PointCLIP_V2
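To make the zero-shot pipeline concrete, here is a hedged sketch of CLIP-based point cloud classification. The projection function is a deliberately naive placeholder for the paper's realistic shape projection module, and the class names and prompts are invented rather than LLM-generated:

```python
import torch
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git

def naive_depth_map(points, size=224):
    """Crude stand-in for the paper's realistic shape projection module:
    scatter a point cloud (N, 3) onto the xy-plane, using z as depth."""
    pts = points - points.min(dim=0).values      # shift into positive octant
    pts = pts / pts.max()                        # scale into [0, 1]
    ij = (pts[:, :2] * (size - 1)).long()        # pixel coordinates per point
    depth = torch.zeros(size, size)
    depth[ij[:, 1], ij[:, 0]] = pts[:, 2]        # later points overwrite earlier
    return depth.unsqueeze(0).expand(3, -1, -1)  # replicate to 3 channels

model, _ = clip.load("ViT-B/32", device="cpu")
classnames = ["airplane", "chair", "lamp"]       # placeholder class vocabulary
# hand-written prompts; the paper instead derives richer prompts from an LLM
texts = clip.tokenize([f"a depth map of a {c}" for c in classnames])

points = torch.rand(1024, 3)                     # random stand-in point cloud
image = naive_depth_map(points).unsqueeze(0)     # (1, 3, 224, 224)

with torch.no_grad():
    img = model.encode_image(image)
    txt = model.encode_text(texts)
    img = img / img.norm(dim=-1, keepdim=True)
    txt = txt / txt.norm(dim=-1, keepdim=True)
    probs = (100.0 * img @ txt.t()).softmax(dim=-1)  # zero-shot class scores
print(probs)  # no 3D training is involved anywhere in this pipeline
```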